out-of-sample extension
Guided Manifold Alignment with Geometry-Regularized Twin Autoencoders
Rhodes, Jake S., Rustad, Adam G., Nielsen, Marshall S., McClellan, Morgan Chase, Gardner, Dallan, Hedges, Dawson
Manifold alignment (MA) involves a set of techniques for learning shared representations across domains, yet many traditional MA methods are incapable of out-of-sample extension, limiting their real-world applicability. We propose a guided representation learning framework that leverages a geometry-regularized twin autoencoder (AE) architecture to enhance MA while enabling generalization to unseen data. Our method enforces structured cross-modal mappings to maintain geometric fidelity in the learned embeddings. By incorporating a pre-trained alignment model and a multitask learning formulation, we improve cross-domain generalization and representation robustness while maintaining alignment fidelity. We evaluate our approach using several MA methods, showing improvements in embedding consistency, information preservation, and cross-domain transfer. Additionally, we apply our framework to Alzheimer's disease diagnosis, demonstrating its ability to integrate multi-modal patient data and enhance predictive accuracy in cases limited to a single domain by leveraging insights from the multi-modal problem.

Manifold learning encompasses a set of methods used to create a lower-dimensional representation, or embedding, of higher-dimensional data. Such representations can play a key role in data visualization [1]-[5], serve as a dimensionality-reduction preprocessing step for subsequent machine-learning or analytical tasks [6], or act as a denoising mechanism [4]. In multi-domain problems, where multiple types of data are considered, manifold learning becomes more challenging: data distributions across domains or modalities may exhibit domain-specific variations while still sharing a common geometric structure. Manifold alignment (MA) seeks to address this problem. In some contexts, a common, shared representation of multi-modal data can be viewed as a natural extension of manifold learning. For example, cell samples of the same type but collected at a different time or using different methodologies should still share common features, but differences in the measured features may occur due to batch effects [7], obscuring the similarities.
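The architecture described in this abstract can be pictured with a minimal sketch: one autoencoder per domain, a shared latent space, and a loss combining per-domain reconstruction, a cross-domain alignment term on paired samples, and a geometry regularizer tying the latents to coordinates from a pre-trained alignment model. All names, layer sizes, and the exact loss composition below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def make_ae(d_in, d_latent):
    enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_latent))
    dec = nn.Sequential(nn.Linear(d_latent, 64), nn.ReLU(), nn.Linear(64, d_in))
    return enc, dec

enc_a, dec_a = make_ae(d_in=100, d_latent=2)   # domain A autoencoder
enc_b, dec_b = make_ae(d_in=80, d_latent=2)    # domain B autoencoder
mse = nn.MSELoss()

def twin_loss(x_a, x_b, z_ref, lam_geo=1.0, lam_align=1.0):
    # z_ref: embedding coordinates from a frozen, pre-trained alignment model.
    z_a, z_b = enc_a(x_a), enc_b(x_b)
    recon = mse(dec_a(z_a), x_a) + mse(dec_b(z_b), x_b)  # per-domain reconstruction
    geo = mse(z_a, z_ref) + mse(z_b, z_ref)              # geometry regularizer (assumed form)
    align = mse(z_a, z_b)                                # paired samples share a latent point
    return recon + lam_geo * geo + lam_align * align

x_a, x_b, z_ref = torch.randn(32, 100), torch.randn(32, 80), torch.randn(32, 2)
print(twin_loss(x_a, x_b, z_ref))
```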
Out-of-Core Dimensionality Reduction for Large Data via Out-of-Sample Extensions
Reichmann, Luca, Hägele, David, Weiskopf, Daniel
Dimensionality reduction (DR) is a well-established approach for the visualization of high-dimensional data sets. While DR methods are often applied to typical DR benchmark data sets in the literature, they might suffer from high runtime complexity and memory requirements, making them unsuitable for large data visualization, especially in environments outside of high-performance computing. To perform DR on large data sets, we propose the use of out-of-sample extensions. Such extensions allow inserting new data into existing projections, which we leverage to iteratively project data into a reference projection that consists of only a small, manageable subset. This process makes it possible to perform DR out-of-core on large data, which would otherwise not be possible due to memory and runtime limitations. For metric multidimensional scaling (MDS), we contribute an implementation with out-of-sample projection capability, since typical software libraries do not support it. We provide an evaluation of the projection quality of five common DR algorithms (MDS, PCA, t-SNE, UMAP, and autoencoders) using quality metrics from the literature and analyze the trade-off between the size of the reference set and projection quality. The runtime behavior of the algorithms is also quantified with respect to reference set size, out-of-sample batch size, and dimensionality of the data sets. Furthermore, we compare the out-of-sample approach to other recently introduced DR methods, such as PaCMAP and TriMAP, which claim to handle larger data sets than traditional approaches. To showcase the usefulness of DR at this large scale, we contribute a use case where we analyze ensembles of streamlines amounting to one billion projected instances.
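The out-of-core loop is simple to sketch: fit a projection on a small reference subset, then stream the remaining data through the method's out-of-sample transform in manageable batches. In the sketch below, PCA stands in for any DR method exposing fit/transform (the paper's MDS out-of-sample projection is a custom contribution not reproduced here); sizes and names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 50))       # pretend this is too big to embed at once

ref = X[rng.choice(len(X), size=2_000, replace=False)]  # small reference subset
dr = PCA(n_components=2).fit(ref)        # reference projection

embedding = np.empty((len(X), 2))
for start in range(0, len(X), 10_000):   # project out-of-sample data batch by batch
    batch = X[start:start + 10_000]
    embedding[start:start + 10_000] = dr.transform(batch)
```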
Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides a unified framework for extending Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling (for dimensionality reduction) as well as for Spectral Clustering. This framework is based on seeing these algorithms as learning eigenfunctions of a data-dependent kernel. Numerical experiments show that the generalizations performed have a level of error comparable to the variability of the embedding algorithms due to the choice of training data.
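Viewing the embedding coordinates as eigenfunctions of a data-dependent kernel yields a Nyström-style extension formula, roughly f_k(x) = (1/λ_k) Σ_i v_ik K(x, x_i). The sketch below uses a plain RBF kernel purely for illustration; each algorithm in the paper defines its own data-dependent kernel, and scaling conventions are omitted.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

X = np.random.default_rng(0).normal(size=(200, 3))
K = rbf(X, X)
lam, V = np.linalg.eigh(K)
lam, V = lam[::-1], V[:, ::-1]           # sort eigenpairs in descending order

def extend(x_new, k=2):
    # Embed an out-of-sample point without recomputing the eigendecomposition.
    kx = rbf(x_new[None, :], X)[0]       # kernel values against the training set
    return np.array([kx @ V[:, j] / lam[j] for j in range(k)])

print(extend(np.zeros(3)))
```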
Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization
Sun, Yan, Han, Yi, Fan, Jicong
Dimensionality reduction techniques aim at representing high-dimensional data in low-dimensional spaces to extract hidden and useful information or facilitate visual understanding and interpretation of the data. However, few of them take into consideration the potential cluster information contained implicitly in the high-dimensional data. In this paper, we propose LaptSNE, a new graph-layout nonlinear dimensionality reduction method based on t-SNE, one of the best techniques for visualizing high-dimensional data as 2D scatter plots. Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding while learning to preserve the local and global structure of the high-dimensional space. Solving the proposed model is nontrivial because the eigenvalues of the normalized symmetric Laplacian are functions of the decision variable. We provide a majorization-minimization algorithm with a convergence guarantee to solve the optimization problem of LaptSNE and show how to calculate the gradient analytically, which may be of broad interest when considering optimization with a Laplacian-composited objective. We evaluate our method through a formal comparison with state-of-the-art methods on seven benchmark datasets, both visually and via established quantitative measurements. The results demonstrate the superiority of our method over baselines such as t-SNE and UMAP. We also provide out-of-sample, large-scale, and mini-batch extensions for our LaptSNE to facilitate dimensionality reduction in various scenarios.
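One plausible reading of the cluster-contractive term (the paper's exact weighting and normalization may differ) is a penalty on the smallest eigenvalues of the normalized Laplacian built from the low-dimensional similarities: those eigenvalues approach zero exactly when the embedding splits into that many clusters. A sketch of such a penalty:

```python
import numpy as np

def laplacian_penalty(Q, k=3):
    """Sum of the k smallest eigenvalues of the symmetric normalized
    Laplacian of a similarity matrix Q (illustrative form only)."""
    d = Q.sum(axis=1)
    L_sym = np.eye(len(Q)) - Q / np.sqrt(np.outer(d, d))
    return np.linalg.eigvalsh(L_sym)[:k].sum()  # eigvalsh returns ascending order

# Toy check: two well-separated blobs give a smaller penalty than one blob.
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
Q = np.exp(-((Y[:, None] - Y[None, :]) ** 2).sum(-1))
print(laplacian_penalty(Q, k=2))
```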
Tensor-based Multi-view Spectral Clustering via Shared Latent Space
Tao, Qinghua, Tonin, Francesco, Patrinos, Panagiotis, Suykens, Johan A. K.
Multi-view Spectral Clustering (MvSC) attracts increasing attention due to the diversity of data sources. However, most existing works do not support out-of-sample predictions and overlook model interpretability and exploration of clustering results. In this paper, a new method for MvSC is proposed via a shared latent space from the Restricted Kernel Machine framework. Through the lens of conjugate feature duality, we cast the weighted kernel principal component analysis problem for MvSC and develop a modified weighted conjugate feature duality to formulate dual variables. In our method, the dual variables, playing the role of hidden features, are shared by all views to construct a common latent space, coupling the views by learning projections from view-specific spaces. This single latent space promotes well-separated clusters and provides straightforward data exploration, facilitating visualization and interpretation. Our method requires only a single eigendecomposition, whose dimension is independent of the number of views. To capture higher-order correlations, tensor-based modelling is introduced without increasing computational complexity. Our method can be flexibly applied with out-of-sample extensions, enabling greatly improved efficiency for large-scale data with fixed-size kernel schemes. Numerical experiments verify that our method is effective regarding accuracy, efficiency, and interpretability, showing a sharp eigenvalue decay and distinct latent variable distributions.
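The "single eigendecomposition whose dimension is independent of the number of views" can be caricatured as follows: build one kernel per view over the same samples, couple the views into a single matrix, and eigendecompose once to obtain latent features shared by all views. The additive coupling below is a deliberate simplification of the paper's weighted-KPCA dual, for illustration only.

```python
import numpy as np

def rbf(X, gamma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
views = [rng.normal(size=(150, 5)), rng.normal(size=(150, 8))]  # two views, same samples

K = sum(rbf(V) for V in views)        # couple the views in one kernel matrix
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
lam, U = np.linalg.eigh(H @ K @ H)    # single n x n eigendecomposition
hidden = U[:, ::-1][:, :3]            # shared latent features for all views
```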
Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions
Fanuel, Michaël, Aspeel, Antoine, Delvenne, Jean-Charles, Suykens, Johan A. K.
In machine learning or statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method where the embedding coordinates are the eigenvectors of a positive semi-definite kernel obtained as the solution of an infinite dimensional analogue of a semi-definite program. This embedding is adaptive and non-linear. A main feature of our approach is the existence of a non-linear out-of-sample extension formula of the embedding coordinates, called a projected Nyström approximation. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust with respect to the influence of outliers, compared with a spectral embedding method.
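Schematically, the out-of-sample formula is a Nyström-type extension of the learned kernel's eigenvectors; the generic form below uses assumed notation, and the paper's projected variant adds a projection step not shown here.

```latex
% Generic Nystrom-type extension (assumed notation): (\lambda_k, v_k) are
% eigenpairs of the learned PSD kernel matrix and
% k_x = (k(x, x_1), \dots, k(x, x_n)) collects similarities to training points.
\hat{\phi}_k(x) = \frac{1}{\lambda_k}\, v_k^{\top} k_x, \qquad k = 1, \dots, m.
```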
A perturbation based out-of-sample extension framework
Out-of-sample extension is an important task in various kernel-based non-linear dimensionality reduction algorithms. In this paper, we derive a perturbation-based extension framework by extending results from classical perturbation theory. We prove that our extension framework generalizes the well-known Nyström method as well as some of its variants. We provide an error analysis for our extension framework, and suggest new forms of extension under this framework that take advantage of the structure of the kernel matrix. We support our theoretical results numerically and demonstrate the advantages of our extension framework both on synthetic and real data.
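To make the setting concrete, the toy check below augments a kernel matrix with one new point and compares the exact eigenvector entry for that point against the plain Nyström estimate that a perturbation framework of this kind refines; the corrections themselves are not reproduced, and eigenvector sign and normalization are ignored.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    return np.exp(-gamma * ((A[:, None] - B[None, :]) ** 2).sum(-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
x = rng.normal(size=4)

K = rbf(X, X)
kx = rbf(X, x[None, :])[:, 0]      # kernel values between training points and x
kxx = 1.0                          # k(x, x) for the RBF kernel

lam, V = np.linalg.eigh(K)
nystrom = kx @ V[:, -1] / lam[-1]  # Nystrom estimate of x's leading coordinate

# Exact answer: eigendecompose the kernel matrix augmented with the new point.
K_aug = np.block([[K, kx[:, None]], [kx[None, :], np.array([[kxx]])]])
lam_a, V_a = np.linalg.eigh(K_aug)
exact = V_a[-1, -1]                # exact entry, up to sign and normalization
print(nystrom, exact)
```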
Online high rank matrix completion
Recent advances in matrix completion enable data imputation in full-rank matrices by exploiting low dimensional (nonlinear) latent structure. In this paper, we develop a new model for high rank matrix completion (HRMC), together with batch and online methods to fit the model and out-of-sample extension to complete new data. The method works by (implicitly) mapping the data into a high dimensional polynomial feature space using the kernel trick; importantly, the data occupies a low dimensional subspace in this feature space, even when the original data matrix is of full rank. We introduce an explicit parametrization of this low dimensional subspace, and an online fitting procedure, to reduce computational complexity compared to the state of the art. The online method can also handle streaming or sequential data and adapt to non-stationary latent structure. We provide guidance on the sampling rate required for these methods to succeed. Experimental results on synthetic data and motion capture data validate the performance of the proposed methods.
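The premise that full-rank data can be low-rank in a polynomial feature space is easy to verify numerically. The toy example below (illustrative only, not the paper's algorithm) builds data from a one-dimensional nonlinear latent variable and compares ranks before and after an explicit degree-2 monomial map.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, size=200)
X = np.stack([t, t**2, t**3 + 0.5 * t])   # 3 x n data from a 1-D nonlinear latent variable
print(np.linalg.matrix_rank(X))           # 3: full rank in the original space

# Degree-2 monomial features of each column of X.
Phi = np.stack([X[0]*0 + 1, X[0], X[1], X[2],
                X[0]**2, X[0]*X[1], X[0]*X[2],
                X[1]**2, X[1]*X[2], X[2]**2])
print(np.linalg.matrix_rank(Phi))         # 7 < 10: low rank in the feature space
```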
Limit theorems for out-of-sample extensions of the adjacency and Laplacian spectral embeddings
Levin, Keith, Roosta, Fred, Tang, Minh, Mahoney, Michael W., Priebe, Carey E.
Graph embeddings, a class of dimensionality reduction techniques designed for relational data, have proven useful in exploring and modeling network structure. Most dimensionality reduction methods allow out-of-sample extensions, by which an embedding can be applied to observations not present in the training set. Applied to graphs, the out-of-sample extension problem concerns how to compute the embedding of a vertex that is added to the graph after an embedding has already been computed. In this paper, we consider the out-of-sample extension problem for two graph embedding procedures: the adjacency spectral embedding and the Laplacian spectral embedding. In both cases, we prove that when the underlying graph is generated according to a latent space model called the random dot product graph, which includes the popular stochastic block model as a special case, an out-of-sample extension based on a least-squares objective obeys a central limit theorem about the true latent position of the out-of-sample vertex. In addition, we prove a concentration inequality for the out-of-sample extension of the adjacency spectral embedding based on a maximum-likelihood objective. Our results also yield a convenient framework in which to analyze trade-offs between estimation accuracy and computational expense, which we explore briefly.
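A hedged sketch of the least-squares extension for the adjacency spectral embedding: embed the observed graph with a truncated eigendecomposition, then place a new vertex by regressing its adjacency vector onto the embedding. The RDPG simulation and all names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 300, 2
Z = rng.uniform(0.2, 0.8, size=(n, d)) / np.sqrt(d)    # latent positions (RDPG)
P = Z @ Z.T
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T                         # symmetric, hollow adjacency

lam, V = np.linalg.eigh(A)
idx = np.argsort(lam)[-d:]                             # top-d eigenpairs (positive here)
X_hat = V[:, idx] * np.sqrt(lam[idx])                  # adjacency spectral embedding

a_new = (rng.uniform(size=n) < Z @ Z[0]).astype(float) # edges of a new vertex
w_hat, *_ = np.linalg.lstsq(X_hat, a_new, rcond=None)  # least-squares OOS estimate
print(w_hat)
```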
Generalization Properties of hyper-RKHS and its Application to Out-of-Sample Extensions
Liu, Fanghui, Shi, Lei, Huang, Xiaolin, Yang, Jie, Suykens, Johan A. K.
Hyper-kernels endowed by a hyper-Reproducing Kernel Hilbert Space (hyper-RKHS) formulate the kernel learning task as learning on the space of kernels itself, which provides significant model flexibility for kernel learning, with outstanding performance in real-world applications. However, the convergence behavior of these learning algorithms in hyper-RKHS has not been investigated in learning theory. In this paper, we conduct an approximation analysis of kernel ridge regression (KRR) and support vector regression (SVR) in this space. To the best of our knowledge, this is the first work to study the approximation performance of regression in hyper-RKHS. For applications, we propose a general kernel learning framework built on the two introduced regression models to deal with the out-of-sample extension problem, i.e., to learn an underlying general kernel from a pre-given kernel/similarity matrix in hyper-RKHS. Experimental results on several benchmark datasets suggest that our methods are able to learn a general kernel function from an arbitrary given kernel matrix.
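A much-simplified stand-in for regressing a kernel function from a pre-given similarity matrix: treat each entry K[i, j] as a target and the concatenated pair (x_i, x_j) as the input, then fit kernel ridge regression. The paper works in a hyper-RKHS with hyper-kernels; plain KRR over pair features is used below only to illustrate the out-of-sample use (it does not even enforce symmetry).

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 3))
K_given = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))  # "pre-given" matrix

# Each entry K[i, j] becomes a regression target; the pair (x_i, x_j) is the input.
pairs = np.array([np.concatenate([X[i], X[j]]) for i in range(n) for j in range(n)])
targets = K_given.ravel()
model = KernelRidge(kernel="rbf", alpha=1e-3).fit(pairs, targets)

# Out-of-sample use: evaluate the learned kernel at a point not in K_given.
x_new = rng.normal(size=3)
k_new = model.predict(np.concatenate([x_new, X[0]])[None, :])
print(k_new)
```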